Automatic Extraction Of Chinese Multiword Expressions With A Statistical Tool
Abstract
In this paper, we report on our experiment to extract Chinese multiword expressions from corpus resources as part of a larger research effort to improve a machine translation (MT) system. For existing MT systems, identifying multiword expressions (MWEs) and interpreting them accurately from source to target language remains an unsolved problem. Our initial test of the Chinese-to-English translation functions of Systran and CCID’s Huan-Yu-Tong MT systems reveals that, where MWEs are involved, MT tools suffer in terms of both comprehensibility and adequacy of the translated texts. For MT systems to become more practically useful, they need to be enhanced with MWE processing capability. As part of our study towards this goal, we test and evaluate a statistical tool, which was developed for English, for identifying and extracting Chinese MWEs. In our evaluation, the tool achieved precision ranging from 61.16% to 93.96% for different types of MWEs. These results demonstrate that it is feasible to automatically identify many Chinese MWEs using our tool, although it needs further improvement.
Similar resources
Integration of Reduplicated Multiword Expressions and Named Entities in a Phrase Based Statistical Machine Translation System
Language-specific multiword expressions (MWEs) play important roles in many natural language processing (NLP) tasks. The present work reports on integrating reduplicated multiword expressions (RMWEs) into a Phrase Based Statistical Machine Translation (PBSMT) system to improve translation quality between Manipuri, a highly agglutinative Tibeto-Burman language, and English. In addition, Multiw...
MULTILINGUAL MULTIWORD EXPRESSIONS Literature Survey
Multiword expressions are idiosyncratic word usages of a language which often have non-compositional meaning. Knowledge of multiword expressions is necessary for many NLP tasks, such as machine translation, natural language generation, named entity recognition, and sentiment analysis. In order for other NLP applications to benefit from the knowledge of multiword expressions, they need to be ide...
Automatic Extraction of Fixed Multiword Expressions
Fixed multiword expressions are strings of words which together behave like a single word. This research establishes a method for the automatic extraction of such expressions. Our method involves three stages. In the first, a statistical measure is used to extract candidate bigrams. In the second, we use this list to select occurrences of candidate expressions in a corpus, together with their s...
Syntax and Semantics vs. Statistics for Italian Multiword Expressions: Empirical Prototypes and Extraction Strategies
In this work we present an empirical analysis of Italian nominal multiword expressions (MWEs) of the form [noun + adjective] that aims to study their syntactic and semantic features quantitatively in order to improve their automatic identification and collection. Three indices are proposed, which are able to measure syntactic and semantic frozenness of the expressions on empirical b...
TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction
The manual identification of terminology from specialized corpora is a complex task that needs to be addressed by flexible tools, in order to facilitate the construction of multilingual terminologies which are the main resources for computer-assisted translation tools, machine translation or ontologies. The automatic terminology extraction tools developed so far either use a proprietary code or...
Publication date: 2006